Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Code completion problems are an effective type of formative assessment; especially, when used to practice newly learned concepts or topics. While there is a growing body of research in computing education on the use of large language models (LLMs) to support learning content development, the use of LLMs for producing high-quality code completion problems has not yet been explored. In this paper, we analyze the capability of LLMs to generate effective distractors (i.e., plausible but incorrect options) and explanations for completion problems. We utilize common student misconceptions to improve the quality of the generated distractors. Our study suggests that LLMs are capable of generating reasonable distractors and explanations. At the same time, we identify a lack of a sufficiently granular taxonomy of common student misconceptions that would be needed for aligning the generated distractors with the common misconceptions and errors -- a gap that should be addressed in future work.more » « lessFree, publicly-accessible full text available May 14, 2026
-
As large language models (LLMs) show great promise in generating a wide spectrum of educational materials, robust yet cost-effective assessment of the quality and effectiveness of such materials becomes an important challenge. Traditional approaches, including expert-based quality assessment and student-centered evaluation, are resource-consuming, and do not scale efficiently. In this work, we explored the use of pre-existing student learning data as a promising approach to evaluate LLM-generated learning materials. Specifically, we used a dataset where students were completing the program construction challenges by picking the correct answers among human-authored distractors to evaluate the quality of LLM-generated distractors for the same challenges. The dataset included responses from 1,071 students across 22 classes taught from Fall 2017 to Spring 2023. We evaluated five prominent LLMs (OpenAI-o1, GPT-4, GPT-4o, GPT-4o-mini, and Llama-3.1-8b) across three different prompts to see which combinations result in more effective distractors, i.e., those that are plausible (often picked by students), and potentially based on common misconceptions. Our results suggest that GPT-4o was the most effective model, matching close to 50% of the functional distractors originally authored by humans. At the same time, all of the evaluated LLMs generated many novel distractors, i.e., those that did not match the pre-existing human-authored ones. Our preliminary analysis shows that those appear to be promising. Establishing their effectiveness in real-world classroom settings is left for future work.more » « lessFree, publicly-accessible full text available March 3, 2026
-
Worked examples are among the most popular types of learning content in programming classes. However, instructors rarely have time to provide line-by-line explanations for a large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generate a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.more » « less
-
This paper presents a comparison of two instructional strategies meant to help learners better comprehend code and learn programming concepts: reading code examples annotated with expert explanation (worked-out examples) versus scaffolded self-explanation of code examples using an automated system (Intelligent Tutoring System). A randomized controlled trial study was conducted with 90 university students who were assigned to either the control group (reading worked-out examples, a passive strategy) or the experimental group where participants were asked to self-explain and received help, if needed, in the form of questions from the tutoring system( scaffolded self-explanation, an interactive strategy). We found that students with low prior knowledge in the experimental condition had significantly higher learning gains than students with high prior knowledge. However, in the control condition, this distinction in learning outcomes based on prior knowledge was not observed. We also analyzed the effect of self-efficacy on learning gains and the nature of self-explanation. Low self-efficacy students learn almost twice as much in the interactive condition versus the passive condition although the difference was not significant probably because of low sample size. We also found that high self-efficacy students tend to provide more relational explanations whereas low self-efficacy students provide more multi-structural or line-by-line explanations.more » « less
An official website of the United States government
